feat(AGX1-275): per-RPC task permission rewire and 404/403 wrap #249
Open
asherfink wants to merge 1 commit into
dm36 added a commit that referenced this pull request May 27, 2026

…se and two-factor mutations

Mirrors AGX1-275 (PR #249) for agent_api_keys. Wires Spark AuthZ checks into every api_key route, collapses denials to 404 (so name/id probes can't distinguish "present in another tenant" from "absent"), and relies on SpiceDB's transitive expansion of api_key.{update,delete} (= editor & parent_agent->update & tenant_gate) for two-factor mutations rather than issuing two explicit checks at the route layer.

- src/utils/agent_api_key_authorization.py (new): _check_api_key_or_collapse_to_404 catches AuthorizationError, raises ItemDoesNotExist. Same shape as Asher's task helper.
- src/utils/authorization_shortcuts.py: DAuthorizedId routes AgentexResourceType.api_key through the wrap. (DAuthorizedName isn't used for api_keys; the name lookup is (agent_id, name, api_key_type), not a single globally-unique path param, so the route handlers call the collapse helper inline instead.)
- src/api/routes/agent_api_keys.py:
  * POST: explicit agent.update on parent (no api_key resource yet).
  * GET list: DAuthorizedResourceIds + filter; None passes through.
  * GET /name/{name}: inline collapse helper.
  * GET /{id}: DAuthorizedId(api_key, read).
  * DELETE /{id}: DAuthorizedId(api_key, delete). Two-factor via SpiceDB schema (api_key.delete expands to parent_agent.update); no second route-layer check.
  * DELETE /name/{api_key_name}: inline collapse helper.
- tests/unit/api/test_agent_api_keys_authz.py (new): 12 tests, all pass.

Stacked on dhruv/agx1-272-agent-api-keys-dual-write (PR A). Does NOT touch dual-write logic. Does NOT modify agentex-auth.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
b936a08 to 7841696
Related work
Parent epic: AGX1-264 — per-task FGAC. Follow-ups bundled in AGX1-291.
This change is part of a 5-PR stack across 3 repos. Merge order: scaleapi/scaleapi#144783 (release sgp-authz 0.7.1) → scaleapi/agentex#353 → scaleapi/agentex#356 → #246 → this PR.
`Action.CANCEL`; `cancel` op; `register_resource` API + cancel cleanup
Last in the stack: this is the route-side change that actually exercises the permissions written by #246.
Summary
- Routes each task RPC to the matching `AuthorizedOperationType` instead of using `execute` everywhere: `MESSAGE_SEND` / `EVENT_SEND` → `update`, `TASK_CANCEL` → `cancel`, `TASK_CREATE` stays `create`.
- Applies the same `execute → update` swap across `messages.py`, `checkpoints.py`, `states.py` so the editor role can send messages and update checkpoints/state without needing owner. The task SpiceDB schema defines `permission update = (editor + owner) & internal_tenant_gate`, so leaving these on `execute` (owner-only) would lock editors out of routine mutations.
- Collapses task authorization denials to 404 (`ItemDoesNotExist`) across all surfaces (path id, query id, body id, and name routes) so callers can no longer distinguish "task present in another tenant" from "task absent" by comparing 403 vs 404.
- Keeps the shared wrap in `src/utils/task_authorization.py`, reused from both the FastAPI dep factories and the RPC authorize hook.

What changed
- `src/utils/task_authorization.py` (new): `_check_task_or_collapse_to_404(authorization, task_id, operation)`, the shared wrap.
- `src/utils/authorization_shortcuts.py`: `DAuthorizedId` / `DAuthorizedQuery` / `DAuthorizedBodyId` route task checks through the wrap; their inner deps no longer take a `task_repository` (parameter was unused). `DAuthorizedName` now applies the wrap when `resource_type == AgentexResourceType.task`. Previously the name surface leaked 403 vs 404 because `tasks.name` is globally unique, so a probe checked the whole system rather than a single tenant.
- `src/api/routes/agents.py` `_authorize_rpc_request`: each task-resource branch routes through the wrap. The `MESSAGE_SEND` block with `task_name` is restructured to a `try/else` shape so a denied-update on an existing task surfaces as 404. It must NOT fall through to the create-fallback `except`, which would silently promote a denied-update into a create check (a privilege escalation footgun). `TASK_CREATE` and the wildcard `task("*")` checks are intentionally untouched.

AGX1-275 list deliverable (already covered)
The AGX1-275 list-filtering deliverable (only return tasks the caller can read) is already met by existing infrastructure that this PR does not change:
- `DAuthorizedResourceIds(AgentexResourceType.task)` at `src/utils/authorization_shortcuts.py:130` resolves the per-principal accessible task id set.
- That set is consumed at `src/api/routes/tasks.py:82`, which intersects user filters with the accessible set before hitting the repository.

No additional code is needed here; the list surface is already authorized end-to-end.
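For readers unfamiliar with the list path, the intersection step described above can be sketched roughly as follows. This illustrates the existing behavior and is not code from this PR: `intersect_with_accessible` is a hypothetical name, and treating `None` as "no restriction" is an assumption borrowed from the related api_key commit's "None passes through" note, not something confirmed for tasks.

```python
from typing import Iterable, List, Optional


def intersect_with_accessible(
    requested_ids: Optional[Iterable[str]],
    accessible_ids: Optional[Iterable[str]],
) -> Optional[List[str]]:
    """Hypothetical helper: narrow user-supplied id filters to ids the caller may read."""
    if accessible_ids is None:
        # Assumption: None means authorization does not restrict the id set.
        return None if requested_ids is None else list(requested_ids)
    allowed = set(accessible_ids)
    if requested_ids is None:
        # No user filter: the query is limited to the accessible set.
        return sorted(allowed)
    # User filter present: keep only readable ids, preserving the requested order.
    return [task_id for task_id in requested_ids if task_id in allowed]
```

Under these assumptions, an empty accessible set yields an empty filter, so the repository query returns nothing rather than everything.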
Pre-merge verification
The `execute → update` swap changes the operation literal that hits agentex-auth's `check` endpoint. The SGP gateway forwards the literal verbatim, so this needs confirmation that the SGP permissions backend resolves `update` (and `cancel`) against the same `owner` row everyone hitting these routes today already has. Otherwise every running agent's RPCs break at deploy time.

Two acceptable forms before merge:
- A schema doc reference showing the mapping.
- A one-off integration test against the real adapter: an `update` check on an account whose only grant is `owner`. The wire-contract test already pins the literal; this would pin the resolution.

If SGP's task permission map doesn't include `update` / `cancel`, this PR needs to either land behind a flag (keep `execute` for SGP-routed accounts) or backfill the schema first.

Tests
- `tests/unit/api/test_tasks_authz.py`: 17/17 pass.
- `TestPerRpcOperationRouting` tests (incl. `MESSAGE_SEND` create-fallback preserved through the restructure).
- `TestCheckTaskOrCollapseTo404` tests (allow + denied-collapses-to-404).
- `TestDAuthorizedBodyIdTaskWrap` tests.
- `TestDAuthorizedNameTaskWrap` tests (denied-task → 404, allow returns name, agent path unaffected).
- Wire-contract pin: `cancel → "cancel"` (mirrors agentex-auth's).

Out of scope / follow-ups (tracked in AGX1-291)
- `/agents/name/{agent_name}` has the same leak shape; agent FGAC is outside AGX1-264.

Greptile Summary
This PR is the final step in the AGX1-264 per-task FGAC stack: it rewires each task-resource RPC to the correct `AuthorizedOperationType` (`update` for message/event sends, `cancel` for task cancel, `create` unchanged), extracts a shared `_check_task_or_collapse_to_404` helper that converts every auth denial into a 404, and applies that helper across all four authorization-shortcut surfaces (path id, query id, body id, and name).

- `MESSAGE_SEND` / `EVENT_SEND` → `update`, `TASK_CANCEL` → `cancel`, plus the same `execute → update` swap on all message and checkpoint mutation endpoints so editor-role holders can perform routine mutations without needing owner.
- Task authorization denials on every surface (including the `task.name` surface) now return 404, eliminating cross-tenant existence probes. The design trade-off (in-tenant users see 404 on permission-gap updates instead of 403) is documented with a tracked TODO.
- Tests cover the `DAuthorizedBodyId` and `DAuthorizedName` task wraps, and the `cancel` wire-contract literal.

Confidence Score: 3/5
The logic restructuring is correct, but the PR's own description leaves an explicit pre-merge action item open: confirming that SGP resolves the new `update` and `cancel` operation literals against the `owner` grant that all current agents hold. Without that confirmation, every in-flight agent's RPCs would silently receive 404 at deploy time.

The 404-collapse logic, the try/else privilege-escalation guard in `MESSAGE_SEND`, and the per-RPC operation routing are all implemented correctly and are well-tested. The risk is at the cross-repo boundary: the PR author explicitly calls out that the `execute → update` / `cancel` wire-level change requires proof that SGP's task permission schema maps both new operations to the existing `owner` grant before this can be safely deployed. That verification has not been documented as complete, which means the deployment could break all live agents' RPCs simultaneously.

`agentex/src/api/routes/agents.py`: the per-RPC authorization rewire is the change most sensitive to the unconfirmed SGP schema mapping.
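To make the collapse helper and the try/else guard concrete, here is a minimal, self-contained sketch. It is not the PR's code: the exception classes, `FakeAuthorization`, `get_task_id`, and the synchronous signatures are all stand-ins assumed for illustration (the real helper is `_check_task_or_collapse_to_404` and may well be async). What it shows is why the existing-task check has to sit in the `else` branch: the lookup and the collapsed denial both raise `ItemDoesNotExist`, so placing the check inside the `try` would route a denial into the create fallback.

```python
class AuthorizationError(Exception):
    """Stand-in for the service's authorization denial."""


class ItemDoesNotExist(Exception):
    """Stand-in for the error the API layer maps to HTTP 404."""


class FakeAuthorization:
    """Hypothetical in-memory authorizer: allows only listed (resource, operation) pairs."""

    def __init__(self, allowed):
        self.allowed = set(allowed)

    def check(self, resource, operation):
        if (resource, operation) not in self.allowed:
            raise AuthorizationError(f"{operation} denied on {resource}")


def check_task_or_collapse_to_404(authorization, task_id, operation):
    """Run the check; report a denial as 'not found' so 403 and 404 look identical."""
    try:
        authorization.check(f"task:{task_id}", operation)
    except AuthorizationError as exc:
        raise ItemDoesNotExist(f"task {task_id} not found") from exc


def get_task_id(tasks_by_name, task_name):
    """Hypothetical lookup that raises ItemDoesNotExist when the name is unknown."""
    if task_name not in tasks_by_name:
        raise ItemDoesNotExist(f"task {task_name} not found")
    return tasks_by_name[task_name]


def authorize_message_send(authorization, tasks_by_name, task_name):
    """try/else shape: only a genuinely absent task reaches the create fallback."""
    try:
        task_id = get_task_id(tasks_by_name, task_name)  # lookup only, no auth check
    except ItemDoesNotExist:
        # Task is absent: fall back to the wildcard create check.
        authorization.check("task:*", "create")
        return "create"
    else:
        # Task exists: a denial raised here is not handled by the except clause above.
        check_task_or_collapse_to_404(authorization, task_id, "update")
        return "update"
```

With a caller that holds only the wildcard create grant, an unknown task name reaches the create branch, while a known task name without `update` raises `ItemDoesNotExist` instead of being promoted to a create check.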
Important Files Changed
Sequence Diagram
sequenceDiagram
    participant C as Caller
    participant R as agents.py _authorize_rpc_request
    participant H as _check_task_or_collapse_to_404
    participant A as AuthorizationService
    participant T as TaskService
    C->>R: RPC request (MESSAGE_SEND / EVENT_SEND / TASK_CANCEL)
    alt task_id provided
        R->>H: "check(task_id, update|cancel)"
        H->>A: authorization.check(task resource, operation)
        alt Allowed
            A-->>H: OK
            H-->>R: returns
        else AuthorizationError
            A-->>H: AuthorizationError
            H-->>R: raise ItemDoesNotExist (404)
        end
    else task_name provided (MESSAGE_SEND create-fallback path)
        R->>T: "get_task(name=task_name)"
        alt Task absent
            T-->>R: ItemDoesNotExist
            R->>A: "check(task(*), create)"
        else Task present
            T-->>R: existing_task
            R->>H: check(existing_task.id, update)
            H->>A: authorization.check(task resource, update)
            alt Allowed
                A-->>H: OK
                H-->>R: returns
            else AuthorizationError
                A-->>H: AuthorizationError
                H-->>R: raise ItemDoesNotExist (404, NOT create fallback)
            end
        end
    end
    R-->>C: proceed or 404

Comments Outside Diff (1)
`agentex/src/api/routes/agents.py`, line 320-416
`execute → update` / `cancel` goes live

The PR description itself flags this as a required pre-merge gate: changing the operation literal from `execute` to `update` (and introducing `cancel`) is only safe if SGP's task permission schema already maps both operations to the `owner` grant that every in-flight agent holds today. If it does not, every `MESSAGE_SEND`, `EVENT_SEND`, and `TASK_CANCEL` RPC will receive an auth denial at deploy time, collapsing to 404, with no obvious error signal beyond agents silently failing.

The description lists two acceptable verification forms (schema doc reference or one-off integration test against the real adapter) but does not record the outcome of either. Since all affected agents in production would be impacted simultaneously, this confirmation should be documented before the PR is merged.
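For orientation, the literals at stake can be sketched as a small routing table. This is an illustration only: the enum members shown are an assumed subset of `AuthorizedOperationType`, keying the map by RPC method name is an assumption, and `wire_literal` is a hypothetical helper. The mapping itself follows the PR summary.

```python
from enum import Enum


class AuthorizedOperationType(str, Enum):
    """Illustrative subset; the real enum may define more members."""

    CREATE = "create"
    UPDATE = "update"
    CANCEL = "cancel"
    EXECUTE = "execute"


# RPC method -> operation, as described in the PR summary.
RPC_OPERATION = {
    "MESSAGE_SEND": AuthorizedOperationType.UPDATE,
    "EVENT_SEND": AuthorizedOperationType.UPDATE,
    "TASK_CANCEL": AuthorizedOperationType.CANCEL,
    "TASK_CREATE": AuthorizedOperationType.CREATE,
}


def wire_literal(rpc_method: str) -> str:
    """Return the string that would be forwarded to the auth service's check endpoint."""
    return RPC_OPERATION[rpc_method].value
```

A table like this pins only the literal; whether the backend resolves `update` and `cancel` against an owner-only grant is the separate question the comment above asks to have confirmed.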
Reviews (1): Last reviewed commit: "feat(AGX1-275): per-RPC task permission ..."